(navigation image)
Home American Libraries | Canadian Libraries | Universal Library | Community Texts | Project Gutenberg | Children's Library | Biodiversity Heritage Library | Additional Collections
Search: Advanced Search
Anonymous User (login or join us)
Upload
See other formats

Full text of "Determining wheather the contents of a data file are sufficent to fill a retrieval request"

WORKING PAPER 
ALFRED P. SLOAN SCHOOL OF MANAGEMENT 



DETERMINING WHETHER THE CONTENTS OF A DATA FILE 
ARE SUFFICIENT TO FILL A RETRIEVAL REQUEST 



Christopher R. Sprague 
March 1970 



44«-70 



'f 



MASSACHUSETTS 
INSTITUTE OF TECHNOLOGY 

50 MEMORIAL DRIVE 
"RIDGE, MASSACHUSETT 



OcWtY LIBRARY 



DETERMINING WHETHER THE CONTENTS OF A DATA FILE 
ARE SUFFICIENT TO FILL A RETRIEVAL REQUEST 



Christopher R. Sprague 
March 1970 



44«-70 



This paper is a direct extension of ideas developed in my S.M. Thesis. 
I am indebted to J. C. Emery and D. C. Carroll for their aid. 



HDA 
.nq<//(/ 



W 370 



We present a solution to a problem of considerable practical interest 
;jhich arises in the design of Management Information Systems. After pre- 
senting an example of the problem, we shall formalize it. 

Example : Suppose we have a personnel file containing complete data on 
all our 20,000 workers. The Operations Research Group, wishing to conduct 
a study of absenteeism among older workers, has created a much smaller file 
:ontaining the records of all those aged 60 or over as well as those of any 
age who were absent 10 or more days in the last year. Let us suppose that 
this file is 1500 records long. 

The personnel department now wishes to obtain the names of all employees 
rfithin 2 years of compulsory retirement at age 65. The desired list can be 
obtained either from the large file owned by the personnel department or from 
the smaller file owned by the Operations Research Group. Question: How 
•70uld the information system "know" that the list could be obtained from 
the shorter file (at some considerable saving)? 

Formalization : Let a "record" be a collection of integers k^ , . . . ,k. , . . . ,k 

[n general, the value of k. for any record implies nothing whatsoever about 

the value of k. for another record. Let a "file" be a collection of m records. 
J 

Let there be a special file called the "master file" which consists of all 
records of interest. 

Let a "request" R. be a function which maps the integer contents of a 
record into the values "true" and "false". Let a "strip file" S be a file 
produced as follows: The master file is read record by record. As each 



- 1 - 



535291 



record is read, the request R. is applied to the integer contents of that 
record. If and only if the value of R is "true", the record is copied to 
the strip file S . 

Further, let the data base at any moment consist of the master file and 

p strip files S, S ,...,S , produced from requests R , . . . ,R. , . . , ,R , 

respectively. 

Now, a new request R ^ is entered into the system. If another request, 

say R , can be found such that R , implies R , then it must be true that 

all of the records in the master file which would cause R ,, to be true are 

p+1 

also in S . In other words, the records in S , , will be a subset of the 
a p+1 

records in S . In this case, there is probably an advantage in creating 

3. 

S ,T from S rather than from the master file. 
p+1 a 

Procedure : Determining whether some request R implies some other 
request R, can be accomplished by a technique called "recursive residual 
reduction". First let us consider requests formed in terms of logical vari- 
ables (i.e., those taking on only the values "true" and "false" (abbreviated 
as 1 and respectively) ; three Boolean operators * ("and") , + ("inclusive or") , 
and ' ("not"); and parentheses; all with their customary meanings. (We assume 
that * takes precedence over + in the hierarchy of operators.) 

We will use the capital letters A-H as logical variables. Consider two 
requests: 



R, = A+B 



R^ = A*(C+B)' 



- 2 - 



R is true if either A or B (or both) is true, and R„ is true if A is true, 
but neither B nor C is true. R- implies R, , since R^ is always true if R^ 

is true. This fact can be determined by a simple procedure, based on the 

2 
"Expansion Theorem" , which states that f(x- x . , . . . ,x ) = x .*f (x^ , . . . ,1 , . . 

X '*f (x , . . . ,0, . . . ,x ) where f is any Boolean expression in logical variables 

x. l,...,x . The expressions f (x. , . . . ,1, . . . ,x ) and f(x. ,...,0 x ) 

are called the "residue about x " and the "residue about x'" respectively. 

Implication can be determined by the following recursive procedure: 

Let U and V be two Boolean expressions. 

1. U does not imply V if U = 1 and V = 0. 

2. U implies VisU=O^V=lor both. 

3. If neither rule 1 nor rule 2 applies, choose a logical variable, say 
Q, appearing in either U, V, or both. (Call this variable the "pivot vari- 
able".) Take residues of U and V about Q and Q'. U implies V if and only if 
the residue of U about Q implies the residue of V about Q and the residue 

of U about Q' implies the residue of V about Q'. 

Reducing a residue to simpler form is achieved by applying the following 
rules: Let E be any logical expression (i.e., a logical variable, a constant 
or 1, or an expression enclosed in parentheses. 

a) 1 + E = E + 1 = 1 

b) 1*E = E*1 = E 

c) + E = E = = E 

d) 0*E = E*0 = 

e) 0* = 1 

f) 1' = 

- 3 - 



g) parentheses should be removed where not needed for operations hier- 
archy, i.e., from any single logical variable or constant; or from a complete 
expression where neither preceded nor followed by an operator. 

Let us go back to R, = A + B 

R^ = A*(C+B) ' 

and apply our rules to see whether R„ implies R, . 

First, we see that neither rule 1 nor rule 2 applies directly, so in- 
voking rule 3 and choosing A as pivot variable, we have 

Residue about A Residue about A' 

R^ 1 + B = 1 + B = B 

Rj 1 *CC+B)' = (C+B)' *CC+B)' = 



By rule 2, it will be seen that the residue of R- about A implies the residue 
of R^ about A (because the residue of R^ about A is 1) ; and the residue of R 
about A' implies the residue of R^ about A' (because the residue of R„ about 
A' is 0) . Therefore R^ implies R, . 

In practice we would probably not represent the logical expressions sym- 
bolically, but rather as a "Polish" string of operators and operands with all 

3 
negations resolved by De Morgan's theorem. 

Now let us return to our original problem of two personnel files. We 

must extend our language to include the relational operators = , ?',>,>, <, 

^, . Suppose that integers k and k. represent age as of January 1 and days 

absent last year respectively. The file owned by the Operations Research 

Group was produced by the request R^ = (k.Z-60) + (k 2llO). The file desired 



- A - 



by the personnel department could be described by R, = (k Z 63) . Our first 
step is to create logical variables corresponding to each relational opera- 
tion. Let: 

A = (k.i:60) 

B = (k.^lO) 

C = (k.2 63). 
That we are permitted to do this is clear, since a relational operation is, 
for any record, either true or false. Our requests are now 

R- = A + B 

R, = C 
4 

But we know that both A and C are derived from the same integer, namely, k.. 
In fact, if C is true, A must also be true. Therefore, C implies A. Recog- 
nizing this fact makes us treat the Expansion Theorem more generally: Any 
residue about C will substitute 1 for both A and C. Any residue about A' 
will substitute for both A and C. 

The reasoning behind this is clear. The residue of U about X is simply 
the form of U, remaining if X is assumed true. But if X implies Y, then the 
residue of U about X must also assume Y to be true. Similarly, the residue 
of U about Y' must assume X to be false. 

Let us now try the generalized procedure on R- and R , , remembering that 
C implies A. 

R3 A + B 

R, C 

4 

Neither rule 1 nor rule 2 applies so applying rule 3 and choosing B as 
pivot variable, we have 

- 5 - 



Residue about B Residue about B' 

R^ A+l = l A+0 = A 

R4 C C 

The residue of R, about B implies the residue of R- about B by rule 2, but 
neither rule 1 nor rule 2 applies to the residues about B', so we invoke rule 
3, choosing C as pivot variable: 

Residue about C Residue about C' 

Residue of R about B' A = 1 (by C implies A) A 
Residue of R. about B' 1 

Both residues of R, imply their corresponding residues of R_, therefore 
R, implies R-. 

Extensions : The general procedure descirbed herein is capable of a 
great many extensions: 

1. File structure — Our definition of a master file is phrased so 
that it includes files stored on direct-access devices as well as sequential 
devices. The notion of a strip file is easily generalized then to a list 

of those records stored in a direct-access master file which meet some cri- 
terion (request) . 

2. Other Relationships — Equivalence, negation, contraimplication, etc., 

4 
can be determined by simple modifications of the procedure described here. 

3. Other Operations — All logical functions of two variables (there 

are at most 16) can be easily handled by appropriate additions to rules a) - g) ^ 
Moreover, one practical situation which we have recently faced required the 
definition of n-ary equivalents of "or" and "and". Extension of this tech- 
nique to n-ary operations is straight forward. 

- 6 - 



4. Finding the shortest qualifying strip file — If we order strip 
files S^...S in ascending order of length, then iteratively apply this 
technique to each R. in turn with respect to R . ■. . the first R implied by 
R is the shortest qualifying file. If none qualify, the master file 
must be used. 



Conclusion 

The "Expansion Theorem" is a powerful tool in the analysis of the con- 
tents of data files. 



- 7 - 



REFERENCES 



1. See Sprague, C.R. , "On Efficient Multiple-User Date Retrieval From 
Serial Files", unpublished S.M. Thesis, M.I.T., 1964. 



2. S.H. Caldwell, Switching Circuits and Logical Design , Wiley, New York, 
1958, Chapter III, Theorem 20. 



3. Sprague, op. cit. 

4. Sprague, op. cit.. Chapter IV. Section B. 

5. A paper is forthcoming. 



- 8 - 



ti ^7" ^--^ *-•'=■" J.- 



Date Due 



JUL 6 "^^ 



APR 



AG 1 5 '«« 




L^iil.70 



3 TOaO 003 702 237 



^'^3 



-70 



' 3 loan 003 7DE Ell 



Lj^'J'lOA 



^OaO 003 b71 EDO 



M T 



f 



'?060 003 b71 B42 



III 11111111111,, .«;• 



'^060 




003 




t?2 



^la 



M 



ill '7^ 



T 



MIT 1.IBBABIES 



^Nl 



^70 



3 TOaO 003 b71 356 



MIT LIBRASIES OUPL 



11 III ill nil III II 

3 TOAD 0D3 7DS 3M4 



ijijg'T^ 



3 TDflD D 



Mr ^IBftAftieS OUPL 



3 70E 31 



V'/<?-'7c^ 



3 Toao 



Mn LISMRIES OuPL 



lIi[|l|ilMI!||!IIM 

3 b71 3DT 



llli|i 



MH aeftARiES ^^^'- 



TOfi 







^/y)-7(i 



t/$/-7P 



03 b71 3E5 



L/?2-7c) 



3 TOfiO 003 b71 333